High definition bubbles for rendering free viewpoint video

ABSTRACT

A “Dynamic High Definition Bubble Framework” allows local clients to display and navigate FVV of complex multi-resolution and multi-viewpoint scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV. Generally, the FVV is presented to the user as a broad view of the overall area from some distance away. Then, as the user zooms in or changes viewpoints, one or more regions of the overall area are provided in higher definition or fidelity. Therefore, rather than capturing and providing high definition everywhere (at high computational and bandwidth costs), the Dynamic High Definition Bubble Framework captures one or more “bubbles” or volumetric regions in higher definition in locations where it is believed that the user will be most interested. This information is then provided to the client to allow individual clients to navigate and zoom different regions of the FVV during playback without losing fidelity or resolution in the zoomed areas.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under Title 35, U.S. Code, Section 119(e), of a previously filed U.S. Provisional Patent Application, Ser. No. 61/653,983 filed on May 31, 2012, by Simonnet, et al., and entitled “INTERACTIVE SPATIAL VIDEO,” the subject matter of which is incorporated herein by reference.

BACKGROUND

In general, in free-viewpoint video (FVV), multiple video streams are used to re-render a time-varying scene from arbitrary viewpoints. The creation and playback of a FVV is typically accomplished using a substantial amount of data. In particular, in FVV, scenes are generally simultaneously recorded from many different perspectives using sensors such as RGB cameras. This recorded data is then generally processed to extract 3D geometric information in the form of geometric proxies or models using various 3D reconstruction (3DR) algorithms. The original RGB data and geometric proxies are then recombined during rendering, using various image based rendering (IBR) algorithms, to generate multiple synthetic viewpoints.

Unfortunately, when a complex FVV such as a football game is recorded or otherwise captured, rendering the entire volume of the overall capture area to generate the FVV generally uses a very large dataset and a correspondingly large computational overhead for rendering the various viewpoints of the FVV for viewing on local clients.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Further, while certain disadvantages of prior technologies may be noted or discussed herein, the claimed subject matter is not intended to be limited to implementations that may solve or address any or all of the disadvantages of those prior technologies.

In general, a “Dynamic High Definition Bubble Framework” as described herein provides various techniques that allow local clients to display free viewpoint video (FVV) of complex 3D scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV. These techniques allow the client to perform spatial navigation through the FVV, while changing viewpoints and/or zooming into one or more higher definition regions or areas (specifically defined and referred to herein as “high definition bubbles”) within the overall area or scene of the FVV.

More specifically, the Dynamic High Definition Bubble Framework enables local rendering of FVV by providing a lower fidelity geometric proxy of an overall scene or viewing area in combination with one or more higher fidelity geometric proxies of the scene corresponding to regions of interest (e.g., areas of action in the scene that the user may wish to view in expanded detail and from one or more different viewpoints). This allows the user to view the entire volume of the scene as FVV, with interesting features or regions of the scene being provided in higher detail and optionally from a plurality of user-selectable viewpoints, while reducing the amount of data that is transmitted to the client for local rendering of the FVV. Note that the high definition bubbles may have differing resolution or fidelity levels as well as differing numbers of viewpoints. Further, some of these viewpoints may be available at different resolutions or fidelity levels even within the same high definition bubble.

The Dynamic High Definition Bubble Framework enables these capabilities by providing multiple areas or sub-regions of higher definition video capture within the overall viewing area or scene. One implementation of this concept is to use multiple cameras (e.g., a camera array or the like) surrounding the scene to capture the scene or event holistically, in whatever resolution is desired. Concurrently, a set of cameras (e.g., a camera array or the like) that zoom in on particular regions of interest within the overall scene are used to create higher definition geometric proxies that enable a higher quality viewing experience of “bubbles” associated with the zoomed regions of the scene.

For example, various embodiments of the Dynamic High Definition Bubble Framework are enabled by using captured image or video data to create a 3D representation (or other visual representation of the “real” world) of the overall space of a scene. One or more sub-regions (i.e., high definition bubbles) of the larger space of the overall scene are then transferred to the client as high definition geometric proxies while the remaining areas of the overall scene are transferred to the client using lower resolution geometric proxies. Advantageously, the sub-regions represented by the high definition bubbles can be in fixed or predefined positions (e.g., the end zone of a football field) or can move within the larger area of the overall scene (e.g., camera arrays following a ball or a particular player in a soccer game). These high definition bubbles are enabled by using any desired combination of fixed and moving camera arrays to capture high-resolution image data within one or more regions of interest relative to the area of the overall scene.

Captured image data is then used to generate geometric proxies or 3D models of the scene for local rendering of the FVV from any available viewpoint and at any desired resolution corresponding to the selected viewpoint. Note also that the FVV can be pre-rendered and sent to the client as a viewable and navigable FVV.

In particular, when used to stream 3D geometric proxies or models and corresponding RGB data to the client for local rendering of the FVV, the techniques enabled by the Dynamic High Definition Bubble Framework serve to reduce the amount of data used to render a specific viewpoint and resolution selected by the user when viewing or navigating the FVV. This approach is also applicable to server side rendering performance, when a video frame is generated on the server and transmitted to the client. In the server side example, using lower fidelity representations of areas that are far away from a region of interest (i.e., the desired viewpoint) in combination with using higher fidelity representations of the regions of interest reduces the time and computational overhead needed for generating video frames prior to transmission to the client.

In other words, in various embodiments, the Dynamic High Definition Bubble Framework creates a navigable FVV that presents a general or remote view (e.g., relatively far back from the action) of an overall volumetric space and then chooses an optimal dataset to use to render various portions of the FVV at the desired resolutions/fidelity. This allows the Dynamic High Definition Bubble Framework to seamlessly support varying resolutions for different regions while optimally choosing the appropriate dataset to process for the desired output. Advantageously, rendering regions within the high definition bubbles using higher resolutions allows the user to zoom into those regions without creating pixelization artifacts or other zoom-based viewing problems. In other words, even though the user is zooming into particular areas or regions, the FVV displayed to the user does not lose fidelity or resolution in those zoomed areas.

In view of the above summary, it is clear that the Dynamic High Definition Bubble Framework described herein provides various techniques that allow local clients to display and navigate FVV of complex multi-resolution and multi-viewpoint scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV. In addition to the just described benefits, other advantages of the Dynamic High Definition Bubble Framework will become apparent from the detailed description that follows hereinafter when taken in conjunction with the accompanying drawing figures.

DESCRIPTION OF THE DRAWINGS

The specific features, aspects, and advantages of the claimed subject matter will become better understood with regard to the following description, appended claims, and accompanying drawings where:

FIG. 1 provides an exemplary architectural flow diagram that illustrates program modules for using a “Dynamic High Definition Bubble Framework” for creating and navigating free viewpoint videos (FVV) of complex multi-resolution and multi-viewpoint scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV to clients, as described herein.

FIG. 2 provides an illustration of high definition bubbles within an overall viewing area or scene, as described herein.

FIG. 3 provides an illustration of the use of separate camera arrays to capture a high definition bubble and an overall viewing area, as described herein.

FIG. 4 provides a general system flow diagram that illustrates exemplary methods for implementing various embodiments of the Dynamic High Definition Bubble Framework for creating and navigating FVVs having high definition bubbles, as described herein.

FIG. 5 is a general system diagram depicting a simplified general-purpose computing device having simplified computing and I/O capabilities for use in implementing various embodiments of the Dynamic High Definition Bubble Framework, as described herein.

DETAILED DESCRIPTION OF THE EMBODIMENTS

In the following description of the embodiments of the claimed subject matter, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the claimed subject matter may be practiced. It should be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the presently claimed subject matter.

1.0 Introduction:

Note that some or all of the concepts described herein are intended to be understood in view of the overall context of the discussion of “Interactive Spatial Video” provided in U.S. Provisional Patent Application, Ser. No. 61/653,983 filed on May 31, 2012, by Simonnet, et al., and entitled “INTERACTIVE SPATIAL VIDEO,” the subject matter of which is incorporated herein by reference.

Note that various examples discussed in the following paragraphs refer to football games and football stadiums for purposes of explanation. However, it should be understood that the techniques described herein are not limited to any particular location, any particular activities, any particular size of volumetric space, or any particular number of scenes or objects.

In general, when a complex free-viewpoint video (FVV) of 3D scenes is recorded, one or more overall capture areas typically surround the “action”, which is confined to one or more smaller volumetric areas or sub-regions within the overall capture area. For example, in a football game, the size of the field is relatively large, but at any given time, the interesting action is generally centered on the ball and one or more players or athletes around the ball. While it is technically feasible to capture and render the entire capture volume at full fidelity, this would typically result in the generation of very large datasets to be sent from the server to the client for local rendering.

Advantageously, a “Dynamic High Definition Bubble Framework,” as described herein, provides various techniques that specifically address such concerns by providing the client with one or more lower fidelity geometric proxies of an overall viewing area or volumetric space. Concurrently, the Dynamic High Definition Bubble Framework provides one or more sub-regions of the overall viewing area as higher fidelity representations. Local clients then use this information to view and navigate through the overall FVV while providing the user with the capability to zoom into areas of higher fidelity. In other words, the Dynamic High Definition Bubble Framework provides various techniques that allow local clients to display and navigate FVV of complex multi-resolution and multi-viewpoint scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV. Advantageously, rendering regions within the high definition bubbles using higher resolutions allows the user to zoom into those regions without creating pixelization artifacts or other zoom-based viewing problems. In other words, even though the user is zooming into particular areas or regions, the FVV displayed to the user does not lose fidelity or resolution in those zoomed areas.

More specifically, the Dynamic High Definition Bubble Framework enables local rendering of image frames of the FVV by providing a lower fidelity geometric proxy of an overall scene in combination with one or more higher fidelity geometric proxies of the scene corresponding to regions of interest (e.g., areas of action in the scene that the user may wish to view in expanded detail). This allows the user to view the entire volume of the scene as FVV, with interesting features or regions of the scene being provided in higher detail in the event that the user zooms into such regions, while reducing the amount of data that is transmitted to the client for local rendering of the FVV.

One implementation of this concept is to use multiple cameras (e.g., camera arrays or the like) surrounding the scene to capture the scene or event holistically, in whatever resolution is desired. Concurrently, a set of cameras that zoom in on particular regions of interest within the overall scene (such as the “action” in a football game where a player is carrying the ball) are used to capture data for creating higher definition geometric proxies that enable a higher quality viewing experience of “bubbles” associated with the zoomed regions of the scene. These bubbles are specifically defined and referred to herein as “high definition bubbles.” Further, depending upon the available camera data, multiple viewpoints of potentially varying resolution or fidelity may be available within each bubble.

For any given scenario (e.g., sporting events, movie scenes, concerts, etc.), the Dynamic High Definition Bubble Framework typically presents a broad view of the overall viewing area or volumetric space from some distance away. Then, as the user zooms in or changes viewpoints, one or more areas of the overall scene or viewing area are provided in higher definition or fidelity. Therefore, rather than providing high definition everywhere (at high computational and bandwidth costs), the Dynamic High Definition Bubble Framework captures one or more bubbles in higher definition in locations or regions where it is believed that the user will be most interested. In other words, an author of the FVV will use the Dynamic High Definition Bubble Framework to capture bubbles in places where it is believed that users may want more detail, or where the author wants users to be able to explore the FVV in greater detail.

Bubbles can be presented to the user in various ways. For example, in displaying the FVV to the user, the user is provided with the capability to zoom and/or change viewpoints (e.g., pans, tilts, rotations, etc.). In the case that the user zooms into a region corresponding to a high definition bubble, the user will be presented with higher resolution image frames during the zoom. As such, there is no need to demarcate explicit regions of the FVV that contain high definition bubbles.

In other words, the user is presented with the entire scene and, as they scroll through it, more data is available in areas (i.e., bubbles) where there is higher detail. For example, by zooming into a high definition bubble around a football, the user will see that there is more detail available to them, while if they zoom into the grass near the edge of a field where there is less action, the user will see less detail (assuming that there is no corresponding high definition bubble there). Therefore, by placing bubbles in areas where the user is expected to look for higher detail (such as a tight view in and around the ball when it is fumbled), more detail is available where the user is likely to zoom, while the user is unlikely to zoom into areas off to one side of the field, distant from the play. Consequently, when the user does zoom into the area around the ball, it creates an illusion as if the user can zoom in anywhere.

In alternate embodiments of the Dynamic High Definition Bubble Framework, the FVV is presented with thumbnails or highlighting within or near the overall scene to alert the user as to locations, regions or bubbles (and optionally available viewpoints) of higher definition. For example, the Dynamic High Definition Bubble Framework can provide a FVV of a boxing match where the overall ring is in low definition, but the two fighters are within a high definition bubble. In this case, the FVV may include indications of either or both the existence of the high definition bubble around the fighters and various available viewpoints within that bubble, such as a view of the opponent from either boxer's perspective.

Advantageously, the Dynamic High Definition Bubble Framework allows different users to have completely different viewing experiences. For example, in the case of a football game, one user can be zoomed into a bubble around the ball, while another user is zoomed into a bubble around cheerleaders on the edge of the football field, while yet another user is zoomed out to see the overall action on the entire field. Further, the same user can watch the FVV multiple times using any of a number of available zooms into one or more high definition bubbles and from any of a number of available viewpoints relative to any of those high definition bubbles.

1.1 System Overview:

As noted above, the “Dynamic High Definition Bubble Framework” provides various techniques that allow local clients to display and navigate FVV of complex multi-resolution and multi-viewpoint scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV. The processes summarized above are illustrated by the general system diagram of FIG. 1. In particular, the system diagram of FIG. 1 illustrates the interrelationships between program modules for implementing various embodiments of the Dynamic High Definition Bubble Framework, as described herein. Furthermore, while the system diagram of FIG. 1 illustrates a high-level view of various embodiments of the Dynamic High Definition Bubble Framework, FIG. 1 is not intended to provide an exhaustive or complete illustration of every possible embodiment of the Dynamic High Definition Bubble Framework as described throughout this document.

In addition, it should be noted that any boxes and interconnections between boxes that may be represented by broken or dashed lines in FIG. 1 represent alternate embodiments of the Dynamic High Definition Bubble Framework described herein, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.

In general, as illustrated by FIG. 1, the processes enabled by the Dynamic High Definition Bubble Framework begin operation by using a data capture module 100 that uses multiple cameras or arrays to capture and generate 3D scene data 120 (e.g., geometric proxies, 3D models, RGB or other color space data, textures, etc.) for an overall viewing area and one or more viewpoints for one or more high definition bubbles within the overall viewing area.

In various embodiments, a user input module 110 is used for various purposes, including, but not limited to, defining and configuring one or more cameras and/or camera arrays for capturing an overall viewing area and one or more high definition bubbles. The user input module 110 is also used in various embodiments to define or specify one or more high definition bubbles, one or more viewpoints or view frustums, resolution or level of detail for one or more of the bubbles and one or more of the viewpoints, etc.

Typically, local clients will render video frames of the FVV from the 3D scene data 120. However, in various embodiments, a pre-rendering module 130 uses the 3D scene data 120 to pre-render one or more FVVs that are then provided to one or more clients for viewing and navigation. In either case, a data transmission module 140 transmits either the pre-rendered FVV or the 3D scene data 120 to one or more clients. The Dynamic High Definition Bubble Framework conserves bandwidth when transmitting to the client by only sending sufficient 3D scene data 120 for the level of detail desired to render image frames corresponding to an initial virtual navigation viewpoint or viewing frustum, or one selected by the client. Following receipt of the 3D scene data 120, local clients use a local rendering module 150 to render one or more FVVs 160 or image frames of the FVV.
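
Purely as a non-limiting illustration, the following Python sketch shows one way such a transmission module might decide which proxy fidelity to send for each scene region intersecting the client's current view frustum. The Region fields, the fidelity names, and the frustum_contains test are illustrative assumptions, not elements of the framework as described:

    # Illustrative sketch: choosing which fidelity of geometric proxy to send
    # for each region, given the client's current view frustum and zoom state.
    from dataclasses import dataclass

    @dataclass
    class Region:
        name: str
        center: tuple        # (x, y, z) scene coordinates of the region
        is_bubble: bool      # True if a high definition bubble covers this region

    def select_payload(regions, frustum_contains, zoomed_into_bubble):
        """Return the proxy fidelity to transmit for each region in the frustum."""
        payload = {}
        for region in regions:
            if not frustum_contains(region.center):
                continue                          # outside the view frustum: send nothing
            if region.is_bubble and zoomed_into_bubble:
                payload[region.name] = "high"     # zoomed into a bubble: high fidelity proxy
            else:
                payload[region.name] = "low"      # overall scene: lower fidelity proxy
        return payload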

Finally, a FVV playback module 170 provides user-navigable interactive playback of the FVV in response to user navigation and zoom commands. In general, the FVV playback module 170 allows the user to pan, zoom, or otherwise navigate through the FVV. Further, user pan, tilt, rotation and zoom information is provided back to the local rendering module 150 or to the data transmission module 140 for use in retrieving the 3D scene data 120 needed to render subsequent image frames of the FVV corresponding to user interaction and navigation through the FVV.
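
The sketch below illustrates, under simplifying assumptions, how pan, tilt, and zoom commands from such a playback module might update a virtual camera whose pose and field of view in turn define the next view frustum; the class, its parameter ranges, and its coordinate conventions are illustrative only:

    # Illustrative virtual camera state updated by pan/tilt/zoom commands.
    import numpy as np

    class VirtualCamera:
        def __init__(self, position, yaw=0.0, pitch=0.0, fov_deg=60.0):
            self.position = np.asarray(position, dtype=float)
            self.yaw, self.pitch, self.fov_deg = yaw, pitch, fov_deg

        def pan(self, degrees):                  # rotate the view left or right
            self.yaw = (self.yaw + degrees) % 360.0

        def tilt(self, degrees):                 # rotate the view up or down, clamped
            self.pitch = float(np.clip(self.pitch + degrees, -89.0, 89.0))

        def zoom(self, factor):                  # narrow the field of view to zoom in
            self.fov_deg = float(np.clip(self.fov_deg / factor, 5.0, 90.0))

        def view_direction(self):
            """Unit view direction derived from yaw and pitch (in degrees)."""
            yaw, pitch = np.radians(self.yaw), np.radians(self.pitch)
            return np.array([np.cos(yaw) * np.cos(pitch),
                             np.sin(yaw) * np.cos(pitch),
                             np.sin(pitch)])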

2.0 Operational Details:

The above-described program modules are employed for implementing various embodiments of the Dynamic High Definition Bubble Framework. As summarized above, the Dynamic High Definition Bubble Framework provides various techniques that allow local clients to display FVV of complex scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV.

The following sections provide a detailed discussion of the operation of various embodiments of the Dynamic High Definition Bubble Framework, and of exemplary methods for implementing the program modules described in Section 1 with respect to FIG. 1. In particular, the following sections provide examples and operational details of various embodiments of the Dynamic High Definition Bubble Framework, including: an operational overview of the Dynamic High Definition Bubble Framework; exemplary FVV scenarios enabled by the Dynamic High Definition Bubble Framework; and data capture scenarios and FVV generation.

2.1 Operational Overview:

As noted above, the Dynamic High Definition Bubble Framework-based processes described herein provide various techniques that allow local clients to display and navigate FVV of complex multi-resolution and multi-viewpoint scenes while reducing computational overhead and bandwidth for rendering and/or transmitting the FVV.

FIG. 2 illustrates various high definition bubbles within an overall viewing area 200, scene, or volumetric space. The Dynamic High Definition Bubble Framework generally uses various cameras or camera arrays to capture the overall viewing area 200 at some desired resolution level. One or more high definition bubbles within the overall viewing area 200 are then captured using various cameras or camera arrays at higher resolution or fidelity levels. As illustrated by FIG. 2, these high definition bubbles (e.g., 210, 220, 230, 240, 250 and 260) can have arbitrary shapes, sizes and volumes. Further, high definition bubbles (e.g., 210, 220, 230) can be in fixed positions to capture particular regions of the overall scene that may be of interest (e.g., end zones in a football game). The high definition bubbles (e.g., 240, 250 and 260) may also represent dynamic regions that move to follow action along arbitrary paths (e.g., 240) or along fixed paths (e.g., 250 to 260). Note also that moving high definition bubbles may sometimes extend outside the overall viewing area 200 (e.g., 260), though this may result in FVV image frames in which only the content of that high definition bubble is visible. One or more high definition bubbles may also overlap (e.g., 230).
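
As a concrete illustration of fixed versus dynamic bubbles, the following Python sketch models a bubble as a sphere whose center is either constant or supplied by a tracking function over time. The spherical shape and the tracker output are simplifying assumptions; as noted above, real bubbles may have arbitrary shapes and volumes:

    import numpy as np

    class Bubble:
        """Illustrative spherical high definition bubble."""

        def __init__(self, center_fn, radius):
            self.center_fn = center_fn      # maps time t to the bubble's (x, y, z) center
            self.radius = radius

        def contains(self, point, t):
            center = np.asarray(self.center_fn(t), dtype=float)
            return np.linalg.norm(np.asarray(point, dtype=float) - center) <= self.radius

    # A fixed bubble over an end zone, and a dynamic bubble following a tracked ball:
    end_zone = Bubble(lambda t: (0.0, 25.0, 0.0), radius=12.0)
    ball_path = {0.0: (50.0, 25.0, 1.0), 1.0: (53.0, 27.0, 1.0)}   # hypothetical tracker output
    ball_bubble = Bubble(lambda t: ball_path[min(ball_path, key=lambda k: abs(k - t))], radius=3.0)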

FIG. 3 illustrates the use of separate camera arrays to capture a high definition bubble 330 using a camera array (e.g., cameras 335, 340, 345 and 350) within an overall viewing area 300 that is in turn captured by a set of cameras (e.g., 305, 310, and 315) at a lower fidelity level than that of the high definition bubble.

Various embodiments of the Dynamic High Definition Bubble Framework are enabled by using captured image or video data to create a 3D representation (or other visual representation of the “real” world) of the overall space of a scene. One or more sub-regions (i.e., high definition bubbles) of the larger space of the overall scene are then transferred to the client as high definition geometric proxies or 3D models while the remaining areas of the overall scene are transferred to the client using lower definition geometric proxies or 3D models. Advantageously, as noted above, the sub-regions represented by the high definition bubbles can be in fixed or predefined positions (e.g., the end zone of a football field) or can move within the larger area of the overall scene (e.g., following a ball or a particular player in a soccer game). These high definition bubbles are enabled by using any desired combination of fixed and moving camera arrays to capture high-resolution image data within one or more regions of interest relative to the area or volume of the overall scene.

Consequently, when used to stream both 3D geometric and RGB data from the server to the client, the FVV processing techniques enabled by the Dynamic High Definition Bubble Framework serve to reduce the amount of data used to render a specific viewpoint selected by the user when viewing a FVV. This approach is also applicable to server side rendering performance, when a video frame is generated on the server and transmitted to the client. In the server side example, using lower fidelity representations of areas that are far away from a region of interest (i.e., the desired viewpoint) in combination with using higher fidelity representations of the regions of interest reduces the time and computational overhead needed for generating video frames prior to transmission to the client.
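
The server side trade-off just described might be summarized, purely as an illustration, as a simple distance-based fidelity policy; the thresholds below are arbitrary assumed values, not values specified anywhere in this description:

    def fidelity_for_region(distance_m, in_bubble):
        """Pick a proxy fidelity from the distance to the desired viewpoint."""
        if in_bubble and distance_m < 10.0:
            return "high"      # inside or near a high definition bubble
        if distance_m < 50.0:
            return "medium"
        return "low"           # far from the region of interest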

2.2 Exemplary FVV Scenarios:

The Dynamic High Definition Bubble Framework enables a wide variety of viewing scenarios for clients or users. As noted above, since the user is provided with the opportunity to navigate and zoom the FVV during playback, the viewing experience can be substantially different for individual viewers of the same FVV.

For example, considering a football game in a typical stadium, the Dynamic High Definition Bubble Framework uses a number of cameras or camera arrays to capture sufficient views to create an overall 3D view of the stadium at low to medium definition or fidelity (i.e., any desired fidelity level). In addition, the Dynamic High Definition Bubble Framework will also capture one or more specific locations or “bubbles” at a higher definition or fidelity and with a plurality of available viewpoints. Note that these bubbles are captured using fixed or movable cameras or camera arrays. For example, again considering the football game, the Dynamic High Definition Bubble Framework may have fixed cameras or camera arrays around the end zone to capture high definition images in these regions at all times. Further, one or more sets of moving cameras or camera arrays can follow the ball or particular players around the field to capture images of the ball or players from multiple viewpoints.

Generally, in the case of a football field, it would be difficult to capture every part of the entire field and all of the action in high definition without using very large amounts of data. Consequently, the Dynamic High Definition Bubble Framework captures and provides an overall view of the field by using some number of cameras capturing the overall field. Then, the Dynamic High Definition Bubble Framework uses one or more sets of cameras that capture the regions around the ball, specific players, etc., so that the overall low definition general background of the football field can be augmented by user navigable high definition views of what is going on in 3D in the “bubbles.” In other words, in various embodiments, the Dynamic High Definition Bubble Framework generally presents a general or remote view (e.g., relatively far back from the action) of an overall volumetric space and then layers or combines navigable high definition bubbles with the overall volumetric space based on a determination of the proper geometric registration or alignment of those high definition bubbles within the overall volumetric space.

In the case of a movie or the like, the Dynamic High Definition Bubble Framework enables the creation of movies where the user is provided with the capability to move around within a particular scene (i.e., change viewpoints) and to view particular parts of the scene, which are within bubbles, in higher definition while the movie is playing.

2.3 Exemplary Data Capture Scenarios and FVV Generation:

The following paragraphs describe various examples of scenarios involving the physical placement and geometric configuration of various cameras and camera arrays within a football stadium to capture multiple high definition bubbles and virtual viewpoints for navigation of FVVs of a football game with associated close-ups and zooms corresponding to the high definition bubbles and virtual viewpoints. It should be understood that the following examples are provided only for purposes of explanation and are not intended to limit the scope or use of the Dynamic High Definition Bubble Framework to the examples presented, to the particular camera array configurations or geometries discussed, or to the positioning or use of particular high definition bubbles or virtual viewpoints.

In general, understanding where cameras or camera arrays will be deployed and the geometry associated with those cameras determines how the resulting 3D scene data will be processed in an interactive Spatial Video (SV) and subsequently rendered to create the FVV for the user or client. In the case of a typical professional football game, it is assumed that all cameras and related technology for capturing images, following action scenes or the ball, cutting to particular locations or persons, etc., exists inside or above the stadium. In some cases, the cameras will record elements before the game. In other cases, the cameras will be used in the live broadcast of the game. In this example, there are several primary configurations, including, but not necessarily limited to the following:

-   Asset Arrays—Camera arrays referred to as “asset arrays” are used to capture 3D image data of players, cheerleaders, coaches, referees, and any other items or people which may appear on the field before the game. Following processing of the raw image data, the output of these asset arrays is both an image intensive photorealistic rendering and a high fidelity geometric proxy similar to a CGI asset for any imaged items or people. This information can then be used in subsequent rendering of the FVV.
-   Environment Model—Mobile SLR cameras, mobile video cameras, laser range scanners, etc., are used to build an image-based geometric proxy for the stadium environment before the game from 3D image data captured by one or more camera arrays. This 3D image data is then generally used to generate a geometric proxy or 3D model of the overall environment. Further, this geometric proxy or 3D model can be edited or modified to suit particular purposes (e.g., modified to allow dynamic placement of advertising messages along a stadium wall or other location during playback of the resulting FVV).
-   Fixed Arrays—Fixed camera arrays are used to capture 3D image data of various game elements or features for insertion into the FVV. These elements include, but are not limited to, announcers, ‘talking heads’, player interviews, intra-game fixed physical locations around the field, etc.
-   Moving Arrays—Mobile camera arrays are used to capture 3D image data of intra-game action on the field. Note that these are the same types of mobile cameras that are currently used to record action in professional football games, though additional numbers of cameras may be used to capture 3D image data of the intra-game action. Note that image or video data captured by fans viewing the game from inside the stadium using cell phones or other cameras can also be used by the Dynamic High Definition Bubble Framework to record intra-game action on the field.

2.3.1 Asset Arrays:

In general, “asset arrays” are dense, fixed camera arrays optimized for creating a static (or moving) geometric proxy of an asset. Assets include any object or person who will be on the field such as players, cheerleaders, referees, footballs, or other equipment. The camera geometry of the asset arrays is optimized for the creation of high fidelity geometric proxies, which requires a ‘full 360’ arrangement of sensors so that all aspects of the asset can be recorded and modeled; additional sensors may be placed above or below the assets. Note that in some cases, ‘full 360’ coverage may not be possible (e.g., views partially obstructed along some range of viewing directions), and that in such cases, user selection of viewpoints in the resulting FVV will be limited to whatever viewpoints can be rendered from the captured data. In addition to RGB (or other color space) cameras in the asset array, other sensor combinations such as active IR based stereo (also used in Kinect® or time of flight type applications) can be used to assist in 3D reconstruction. Additional techniques such as the use of green screen backgrounds can further assist in segmentation of the assets for use in creating high fidelity geometric proxies of those assets.
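
One simple way to picture a ‘full 360’ arrangement is a ring of sensors aimed at the asset, as in the sketch below; the camera count, radius, and height are illustrative assumptions, and a real asset array would also add sensors above and below the assets as noted above:

    import numpy as np

    def ring_camera_poses(n_cameras=24, radius=2.5, height=1.5):
        """Place n cameras on a ring around the asset, all aimed at the center."""
        angles = np.linspace(0.0, 2.0 * np.pi, n_cameras, endpoint=False)
        positions = np.stack([radius * np.cos(angles),
                              radius * np.sin(angles),
                              np.full(n_cameras, height)], axis=1)
        look_at = np.array([0.0, 0.0, height])            # asset at the ring's center
        view_dirs = look_at - positions
        view_dirs /= np.linalg.norm(view_dirs, axis=1, keepdims=True)
        return positions, view_dirs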

Asset arrays are generally utilized prior to the game and focus on static representations of the assets. Once recorded, these assets can be used as SV content for creating FVVs in two different ways, depending on the degree of geometry employed in their representation using image-based rendering (IBR).

Firstly, a low-geometry IBR method, including, but not limited to, view interpolation can be used to place the asset (players or cheerleaders) online using technology including, but not limited to, browser-based 2D or 3D rendering engines. This also allows users to view single assets with a web browser or the like to navigate around a coordinate system that allows them to zoom in to the players (or other assets) from any angle, thus providing the user or viewer with high levels of photorealism with respect to those assets. Again, rendering regions within the high definition bubbles using higher resolutions allows the user to zoom into those regions without losing fidelity or resolution in the zoomed areas, or otherwise creating pixelization artifacts or other zoom-based viewing problems. In other implementations, video can be used to highlight different player/cheerleader promotional activities such as a throw, catch, block, cheer, etc. Note that various examples of view interpolation and view morphing for such purposes are discussed in the aforementioned U.S. Provisional Patent Application, the subject matter of which is incorporated herein by reference.

Secondly, a high fidelity geometric proxy of the players (or other persons such as cheerleaders, referees, coaches, announcers, etc.) is created and combined with view dependent texture mapping (VDTM) for use in close up FVV scenarios. To use these geometric proxies in FVV, a kinematic model for a human is used as a baseline for possible motions and further articulated based on RGB data from live-action video camera arrays. Multi-angle video data is then used to realistically articulate the geometric proxies for all players or a subset of players on the field. Advantageously, 6 degrees of freedom (6-DOF) movement of the user's viewpoint during playback of FVV is possible due to the explicit use of 3D geometry in representing the assets. Again, various techniques for rendering and viewing the 3D content of the FVV are discussed in the aforementioned U.S. Provisional Patent Application, the subject matter of which is incorporated herein by reference.

2.3.2 Environment Model:

A model of the environment is useful to the FVV of the football game in a number of different ways, such as providing a calibration framework for live-action moving cameras, creating interstitial effects when transitioning between known real camera feeds, determining the accurate placement (i.e., registration or alignment) of various geometric proxies (generated from the high definition bubbles) for FVV, improving segmentation results with background data, accurately representing the background of the scene using image-based-rendering methods in different FVV use cases, etc.

As is well known to those skilled in the art, a number of conventional techniques exist for modeling the environment from RGB (or other color space) photos using a sparse geometric representation of the scene. For example, in the case of Photosynth®, sparse geometry means that only enough geometry is extracted to enable the alignment of multiple photographs into a cohesive montage. However, in any scenario, such as the football game scenario, the Dynamic High Definition Bubble Framework provides richer 3D rendering by using much more geometry. More specifically, geometric proxies corresponding to each high definition bubble are registered or aligned to the geometry of the environment model. Once properly positioned, the various geometric proxies are then used to render the frames of the FVV.
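
In its simplest form, this registration amounts to applying a rigid transform that carries a bubble's proxy into the environment model's coordinate frame, as in the sketch below; how the rotation and translation are estimated (e.g., from calibration data or feature matches) is outside the scope of the sketch, and the example geometry is a stand-in:

    import numpy as np

    def register_proxy(proxy_vertices, rotation, translation):
        """Map an (N, 3) array of proxy vertices into environment-model coordinates."""
        return proxy_vertices @ rotation.T + translation

    # Example: rotate a bubble's proxy 90 degrees about the vertical axis and
    # translate it to its determined position on the field.
    theta = np.pi / 2.0
    rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta),  np.cos(theta), 0.0],
                         [0.0,            0.0,           1.0]])
    translation = np.array([30.0, 10.0, 0.0])
    vertices = np.zeros((4, 3))                     # stand-in for real proxy geometry
    placed = register_proxy(vertices, rotation, translation)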

Traditional environment models are often created using a variety of sensors such as moving video cameras, fixed cameras for high resolution static images, and laser based range scanning devices. RGB data from video cameras and fixed camera data can be processed using conventional 3D reconstruction methods to identify features and their location; point clouds of the stadium can be created from these features. Additional geometry, also in the form of point clouds, can be extracted using range scanning devices for additional accuracy. Finally, the point cloud data can be merged together, meshed, and textured into a cohesive geometric model. This geometry can also be used as an infrastructure to organize RGB data for use in other IBR approaches for backgrounds useful for FVV functionality.
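
As a structural sketch only, the merge step reduces to concatenating the per-sensor point sets before handing the combined cloud to meshing and texturing stages; reconstruct_mesh and apply_textures below are hypothetical placeholders for whatever reconstruction library is actually used, and the random arrays stand in for real sensor output:

    import numpy as np

    def merge_point_clouds(clouds):
        """Concatenate (N_i, 3) point arrays from video, still, and range sensors."""
        return np.vstack(clouds)

    video_points = np.random.rand(1000, 3) * 100.0    # stand-in for reconstructed features
    scan_points = np.random.rand(5000, 3) * 100.0     # stand-in for range scanner output
    stadium_cloud = merge_point_clouds([video_points, scan_points])
    # mesh = reconstruct_mesh(stadium_cloud)          # placeholder meshing stage
    # textured = apply_textures(mesh, rgb_images)     # placeholder texturing stage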

Similar to the use of asset arrays, an environment model is created and processed before being used in any live-action footage provided by the FVV. Various methods associated with FVV live action, as discussed below, are made possible by the creation of an environment model, including interstitials, moving camera calibration, and geometry-articulation.

In the simplest use of background models, interstitial movements between real camera positions are enabled, allowing users to more clearly understand where various camera feeds are located. In any SV scenario involving FVV, real camera feeds will have the highest degree of photorealism and will be widely utilized. When a viewer elects to change real camera views—instead of immediately switching to the next video feed—a smooth and sweeping camera movement is optionally enabled by rendering a virtual transition from the viewpoint of one camera view to the other to provide additional spatial information about the location of the cameras relative to the scene.
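
A minimal sketch of such a transition is to interpolate the camera position and look-at target over the sweep, rendering a virtual frame at each step. A production system would interpolate orientations properly (e.g., quaternion slerp), so the linear blend here is a simplifying assumption:

    import numpy as np

    def interstitial_path(pos_a, pos_b, target_a, target_b, n_frames=30):
        """Yield (position, look-at target) pairs sweeping from camera A to camera B."""
        for i in range(n_frames):
            s = i / (n_frames - 1)
            position = (1.0 - s) * np.asarray(pos_a) + s * np.asarray(pos_b)
            target = (1.0 - s) * np.asarray(target_a) + s * np.asarray(target_b)
            yield position, target          # render a virtual frame from this pose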

Additional FVV scenarios make advantageous use of the environment model by using both fixed and moving camera arrays to enable FVV functionality. In the case of moving cameras, these are used to provide close-ups of action on the field (i.e., by registering or positioning geometric proxies generated from the high definition bubbles with the environment model). To use moving cameras for FVV, individual video frames are continuously calibrated based on their orientation and optical focus, as discussed in the aforementioned U.S. Provisional Patent Application, the subject matter of which is incorporated herein by reference.

In general, the Dynamic High Definition Bubble Framework uses structure from motion (SFM) based approaches, as discussed in the aforementioned U.S. Provisional Patent Application, the subject matter of which is incorporated herein by reference, to calibrate the moving camera or cameras based on high resolution static RGB images captured during the environment modeling stage. Finally, for close up FVV functionality the Dynamic High Definition Bubble Framework relies upon the aforementioned articulation of the high-fidelity geometric proxies for the assets (players) using data from both fixed and moving camera arrays. These proxies are then positioned (i.e., registered or aligned) in the correct location on the field by determining where these assets are located relative to the environment model, as discussed in the aforementioned U.S. Provisional Patent Application, the subject matter of which is incorporated herein by reference.

2.3.3 Fixed Arrays:

Fixed camera arrays are used in various scenarios associated with the football game, including intra-game focused footage as well as collateral footage. The defining characteristic of the fixed arrays is that the cameras do not move relative to the scene.

For example, consider the use of FVV functionality for non-game collateral footage—this could include interviews with players or announcers. Further, consider an announcer's stage having a medium density array of fixed RGB video cameras arranged in a 180-degree camera geometry pointing towards the stage for capturing 3D scene data of persons and assets on the stage. In this case, the views being considered generally include close-up views of humans, focused on the face, with limited need for full 6-DOF spatial navigation. Consequently, an IBR approach such as view interpolation, view morphing, or view warping would use a less explicit geometric proxy for the scene, which would therefore emphasize photorealism at the expense of viewpoint navigation.

One use of this FVV functionality is that viewers (or producers) can enable real-time smooth pans between the different announcers as they comment and react. Another application of these ideas is to change views between the announcers and a top down map of the play presented next to the announcers. Another example scenario includes zooming in on a specific cheerleader doing a cheer, assuming that the fixed array is positioned on the field in an appropriate location for such views. In these scenarios, FVV navigation would be primarily limited to synthetic viewpoints between real camera positions or along the axis of the camera geometry. However, by using the available 3D scene data for rendering the image frames, the results would be almost indistinguishable from real camera viewpoints.

The intra-game functionality discussed below highlights various benefits and advantages to the user when using the FVV technology described herein. For example, consider two classes of fixed arrays: one sparse array positioned with whole or partial views of the field from high vantage points within the stadium, and another where denser fixed cameras are positioned around the actual field, such as in the end zone, to capture a high definition bubble of the end zone.

In the case of high vantage point sparse arrays, this video data can be used to enable both far and medium FVV viewpoint control both during the game and during playback. This is considered a sparse array because the relative volume of the stadium is rather large and the distance between sensors is high. In this case, image-based rendering methods such as billboards and articulated billboards may be used to provide two-dimensional representations of the players on the field. These billboards are created using segmentation approaches, which are enabled partially by the environment model. These billboards maintain the photorealistic look of the players, but they do not include the explicit geometry of the players (such as when the players are represented as high fidelity geometric proxies). However, it should be understood that in general, navigation in the FVV is independent of the representation used.

Next, denser fixed arrays on the field, such as around the end zone for capturing high definition bubbles, allow for highly photorealistic viewpoints during both live action and replay. Similar to the announcer's stage discussed above, viewpoint navigation would be largely constrained by the camera axis, using similar image-based-rendering methods described for the announcer's stage. For the most part, these types of viewpoints are specifically enabled when camera density is at an appropriate level and therefore are not generally enabled for all locations within the stadium. In other words, dense camera arrays are used for capturing sub-regions of the overall stadium as high definition bubbles for inclusion in the FVV. In general, these methods are unsuitable for medium and sparse configurations of sensors.

2.3.4 Moving Arrays:

Typical intra-game football coverage comes from moving cameras for both live action coverage and for replays. The preceding discussion regarding camera arrays generally focused on creating high fidelity geometric proxies of players and assets, how an environment model of the stadium can be leveraged to enhance the FVV, and the use of intra-game fixed camera arrays in both sparse and dense configurations. The Dynamic High Definition Bubble Framework ties these elements together with sparse moving camera arrays to enable additional FVV functionality: medium shots using billboards, and close-up shots that leverage full 6-DOF spatial navigation via high fidelity geometric proxies of players or other assets or persons, captured using conventional game cameras and camera operators. In other words, moving camera arrays are used to capture high definition bubbles used in generating FVVs.

Moving cameras in the array are continuously calibrated using SFM approaches leveraging the environment model. The optical zoom functionality of these moving cameras is also used to capture image data within high definition bubbles, using methods including the use of prior frames to help further refine or identify a zoomed in camera geometry. Once the individual frames of the moving cameras have been registered to the geometry of the environment model (i.e., correctly positioned within the stadium), additional image-based-rendering methods are enabled for different FVV scenarios based on the contributing camera geometries, including RGB articulated geometric proxies with maximal spatial navigation, and billboard methods, which emphasize photorealism over spatial navigation.

For example, to enable close up replays with full 6-DOF viewpoint control during playback, the Dynamic High Definition Bubble Framework uses image data from the asset arrays, fixed arrays, and moving arrays. First, the relative position of the players is tracked on the field using one or more fixed arrays. In this way, the approximate location of any player on the field is known. This allows the Dynamic High Definition Bubble Framework to determine which players are in a zoomed in moving camera field of view. Next, based on the identification of the players in the zoomed in fields of view, the Dynamic High Definition Bubble Framework selects the appropriate high-fidelity geometric proxies for each player that were created earlier using the asset arrays.
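
The test for whether a tracked player lies in a zoomed camera's field of view can be sketched as a simple view-cone check, as below; the tracker output and the camera state are hypothetical stand-ins for the fixed-array tracking and moving-camera calibration described above:

    import numpy as np

    def in_field_of_view(player_pos, cam_pos, cam_dir, fov_deg):
        """True if a tracked player falls inside a camera's view cone."""
        to_player = np.asarray(player_pos, dtype=float) - np.asarray(cam_pos, dtype=float)
        to_player /= np.linalg.norm(to_player)
        cos_angle = float(np.dot(to_player, cam_dir / np.linalg.norm(cam_dir)))
        return cos_angle >= np.cos(np.radians(fov_deg / 2.0))

    # Hypothetical tracker and camera state:
    tracked = [{"id": 12, "pos": (40.0, 20.0, 1.0)}, {"id": 87, "pos": (60.0, 5.0, 1.0)}]
    camera = {"pos": (35.0, 18.0, 3.0), "dir": np.array([1.0, 0.4, -0.1]), "fov": 40.0}
    visible = [p["id"] for p in tracked
               if in_field_of_view(p["pos"], camera["pos"], camera["dir"], camera["fov"])]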

Finally, using a kinematic model for known human motion as well as conventional object recognition techniques applied to RGB video (from both fixed and moving cameras), the Dynamic High Definition Bubble Framework determines the spatial orientation of specific players on the field and articulates their geometric proxies as realistically as possible. Note that this also helps in filling in occluded areas (using various hole-filling techniques) when there were insufficient numbers or placements of cameras to capture a view. When the geometric proxies are mapped to their correct location on the field in both space and time, the Dynamic High Definition Bubble Framework then derives a full 6-DOF FVV replay experience for the user. In this way, users or clients can literally view a play from any potential position including close-up shots as well as intra-field camera positions. Advantageously, the net effect here is to enable interactive replays similar to what is possible with various Xbox® football games such as the “Madden NFL” series of electronic games by Electronic Arts Inc., although with real data.

Finally, multiple moving cameras focused on the same physical location of the field can also enable medium and close up views that use IBR methods with less explicit geometry such as billboard methodologies. These cameras can be combined with data from both the environment model as well as the fixed arrays to create additional FVV viewpoints within the stadium.

3.0 Operational Summary:

The processes described above with respect to FIG. 1 through FIG. 3, and in further view of the detailed description provided above in Sections 1 and 2, are illustrated by the general operational flow diagram of FIG. 4. In particular, FIG. 4 provides an exemplary operational flow diagram that summarizes the operation of some of the various embodiments of the Dynamic High Definition Bubble Framework. Note that FIG. 4 is not intended to be an exhaustive representation of all of the various embodiments of the Dynamic High Definition Bubble Framework described herein, and that the embodiments represented in FIG. 4 are provided only for purposes of explanation.

Further, it should be noted that any boxes and interconnections between boxes that are represented by broken or dashed lines in FIG. 4 represent optional or alternate embodiments of the Dynamic High Definition Bubble Framework described herein, and that any or all of these optional or alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.

In general, as illustrated by FIG. 4, the Dynamic High Definition Bubble Framework begins operation by capturing (410) 3D image data for an overall viewing area and one or more high definition bubbles within the overall viewing area. The Dynamic High Definition Bubble Framework then uses the captured data to generate (420) one or more 3D geometric proxies or models for use in generating a Free Viewpoint Video (FVV). For each FVV, a view frustum for an initial or user selected virtual navigation viewpoint is then selected (430). The Dynamic High Definition Bubble Framework then selects (440) an appropriate level of detail for regions in the view frustum based on distance from the viewpoint. Further, as discussed herein, the Dynamic High Definition Bubble Framework uses higher fidelity geometric proxies for regions corresponding to high definition bubbles and lower fidelity geometric proxies for other regions of the overall viewing area.
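
Read as a loop, the flow of FIG. 4 might be sketched as follows; the callables and client methods are named stand-ins for the stages described above and in the following paragraph, not an actual implementation:

    def fvv_session(capture, generate_proxies, client):
        """Illustrative control flow mirroring FIG. 4 (steps 410 through 460)."""
        data = capture()                                   # 410: overall area plus bubbles
        proxies = generate_proxies(data)                   # 420: multi-fidelity proxies
        frustum = client.initial_frustum()                 # 430: initial viewpoint
        while client.is_viewing():
            detail = {name: proxy for name, proxy in proxies.items()
                      if frustum.contains(proxy)}          # 440: detail selection for the frustum
            client.send(detail)                            # 450: transmit proxies for rendering
            frustum = client.next_frustum()                # 460: pan/tilt/zoom picks the next frustum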

The Dynamic High Definition Bubble Framework then provides (450) one or more clients with 3D geometric proxies corresponding to the view frustum, with those geometric proxies having a level of detail sufficient to render the scene (or other objects or people within the current viewpoint) from a viewing frustum corresponding to a user selected virtual navigation viewpoint. Given this data, the FVV is rendered or generated and presented to the user for viewing, with the user then navigating (460) the FVV by selecting zoom levels and virtual navigation viewpoints (e.g., pans, tilts, rotations, etc.), which are in turn used to select the view frustum for generating subsequent frames of the FVV.

4.0 Exemplary Operating Environments:

The Dynamic High Definition Bubble Framework described herein is operational within numerous types of general purpose or special purpose computing system environments or configurations. FIG. 5 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the Dynamic High Definition Bubble Framework, as described herein, may be implemented. It should be noted that any boxes that are represented by broken or dashed lines in FIG. 5 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.

For example, FIG. 5 shows a general system diagram showing a simplified computing device such as computer 500. Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, etc.

To allow a device to implement the Dynamic High Definition Bubble Framework, the device should have a sufficient computational capability and system memory to enable basic computational operations. In particular, as illustrated by FIG. 5, the computational capability is generally illustrated by one or more processing unit(s) 510, and may also include one or more GPUs 515, either or both in communication with system memory 520. Note that the processing unit(s) 510 of the general computing device may be specialized microprocessors, such as a DSP, a VLIW, or other micro-controller, or can be conventional CPUs having one or more processing cores, including specialized GPU-based cores in a multi-core CPU.

In addition, the simplified computing device of FIG. 5 may also include other components, such as, for example, a communications interface 530. The simplified computing device of FIG. 5 may also include one or more conventional computer input devices 540 (e.g., pointing devices, keyboards, audio input devices, video input devices, haptic input devices, devices for receiving wired or wireless data transmissions, etc.). The simplified computing device of FIG. 5 may also include other optional components, such as, for example, one or more conventional computer output devices 550 (e.g., display device(s) 555, audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, etc.). Note that typical communications interfaces 530, input devices 540, output devices 550, and storage devices 560 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.

The simplified computing device of FIG. 5 may also include a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 500 via storage devices 560 and includes both volatile and nonvolatile media that is either removable 570 and/or non-removable 580, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as DVDs, CDs, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.

Storage of information such as computer-readable or computer-executable instructions, data structures, program modules, etc., can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, RF, infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.

Further, software, programs, and/or computer program products embodying some or all of the various embodiments of the Dynamic High Definition Bubble Framework described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.

Finally, the Dynamic High Definition Bubble Framework described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Still further, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
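As one illustration of how such program modules might be divided between local and remote devices, consider the following minimal sketch in Python. All names here (GeometricProxy, intersects_frustum, select_proxies_for_client) are hypothetical placeholders rather than terms defined by this specification: a server-side module tests each geometric proxy's bounding sphere against the client's current view frustum and forwards only the visible proxies, leaving frame rendering to a client-side module.

    # Hypothetical sketch of a server-side "proxy selection" program module;
    # none of these names are taken from the specification itself.
    from dataclasses import dataclass
    from typing import List, Tuple

    Plane = Tuple[float, float, float, float]  # (a, b, c, d) for ax + by + cz + d = 0

    @dataclass
    class GeometricProxy:
        proxy_id: str
        center: Tuple[float, float, float]  # bounding-sphere center, world coordinates
        radius: float                       # bounding-sphere radius
        high_definition: bool               # True if the proxy belongs to a bubble

    def intersects_frustum(proxy: GeometricProxy, frustum: List[Plane]) -> bool:
        """Conservative sphere/frustum test. Planes use inward-facing normals,
        so the sphere is culled only if it lies fully behind some plane."""
        x, y, z = proxy.center
        return all(a * x + b * y + c * z + d >= -proxy.radius
                   for a, b, c, d in frustum)

    def select_proxies_for_client(proxies: List[GeometricProxy],
                                  frustum: List[Plane]) -> List[GeometricProxy]:
        """Server-side module: forward only the proxies within the client's
        current view frustum, so bandwidth is spent on visible geometry."""
        return [p for p in proxies if intersects_frustum(p, frustum)]

A conservative bounding-sphere test of this kind may forward slightly more geometry than strictly necessary, but it never withholds a proxy the client would need to render its current view.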

The foregoing description of the Dynamic High Definition Bubble Framework has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the claimed subject matter to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate embodiments may be used in any combination desired to form additional hybrid embodiments of the Dynamic High Definition Bubble Framework. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.

What is claimed is:
 1. A computer-implemented process for generating navigable free viewpoint video (FVV), comprising using a computer to perform process actions for: generating a geometric proxy from 3D image data of an overall volumetric space; generating one or more geometric proxies for each of one or more sub-regions of the overall volumetric space; registering one or more of the geometric proxies of the sub-regions with the geometric proxy of the overall volumetric space; and rendering a multi-resolution user-navigable FVV from the registered geometric proxies and the geometric proxy of the overall volumetric space, wherein portions of the FVV corresponding to the sub-regions are rendered with a higher resolution than other regions of the FVV.
 2. The computer-implemented process of claim 1 wherein each sub-region is captured at a resolution greater than a resolution used to capture the overall volumetric space.
 3. The computer-implemented process of claim 1 wherein one or more of the sub-regions are captured using one or more moving camera arrays.
 4. The computer-implemented process of claim 1 wherein one or more of the sub-regions are captured using one or more fixed camera arrays.
 5. The computer-implemented process of claim 1 wherein rendering the multi-resolution user-navigable FVV further comprises process actions for: determining a current view frustum corresponding to a current client viewpoint for viewing the FVV; and transmitting appropriate geometric proxies within the current view frustum to the client for local rendering of video frames of the FVV.
 6. The computer-implemented process of claim 1 wherein one or more of the sub-regions move relative to the overall volumetric space during capture of the 3D image data for those sub-regions.
 7. The computer-implemented process of claim 1 wherein one or more of the sub-regions overlap within the overall volumetric space.
 8. A method for generating a navigable 3D representation of a volumetric space, comprising: capturing 3D image data of an overall volumetric space and using this 3D image data to construct an environment model comprising a geometric proxy of the overall volumetric space; capturing 3D image data for one or more sub-regions of the overall volumetric space and generating one or more geometric proxies of each sub-region; registering one or more of the geometric proxies of each sub-region to the environment model; determining a view frustum relative to the environment model; and rendering frames of a multi-resolution user-navigable FVV from portions of the registered geometric proxies and environment model corresponding to the view frustum, wherein portions of the FVV corresponding to the sub-regions are rendered with a higher resolution than other regions of the FVV.
 9. The method of claim 8 wherein the view frustum is determined from a current viewpoint of a client viewing the FVV, and wherein the rendering is performed by the client from portions of the registered geometric proxies and environment model corresponding to the view frustum transmitted to the client.
 10. The method of claim 8 wherein zooming into portions of the FVV rendered with a higher resolution provides greater detail than when zooming into other regions of the FVV.
 11. The method of claim 8 wherein each sub-region is captured at a resolution greater than a resolution used to capture the overall volumetric space.
 12. The method of claim 8 wherein the sub-regions are captured using any combination of one or more moving camera arrays and one or more fixed camera arrays.
 13. The method of claim 8 wherein one or more of the sub-regions move relative to the overall volumetric space during capture of the 3D image data for those sub-regions.
 14. A computer-readable medium having computer executable instructions stored therein for generating a user navigable free viewpoint video (FVV), said instructions causing a computing device to execute a method comprising: capturing 3D image data for an overall viewing area; capturing 3D image data for one or more high definition bubbles within the overall viewing area; generating a geometric proxy from the 3D image data of the overall viewing area; generating one or more geometric proxies from the 3D image data of one or more of the high definition bubbles; aligning one or more of the geometric proxies of the high definition bubbles with the geometric proxy of the overall viewing area; and transmitting portions of any of the aligned geometric proxies corresponding to a current client viewpoint to a client for local client-based rendering of a multi-resolution user-navigable FVV, wherein portions of the FVV corresponding to the high definition bubbles are rendered with a higher resolution than other regions of the FVV.
 15. The computer-readable medium of claim 14 wherein each high definition bubble is captured at a resolution greater than a resolution used to capture the overall viewing area.
 16. The computer-readable medium of claim 14 wherein one or more of the high definition bubbles are captured using one or more moving camera arrays.
 17. The computer-readable medium of claim 14 wherein one or more of the high definition bubbles are captured using one or more fixed camera arrays.
 18. The computer-readable medium of claim 14 wherein rendering the multi-resolution user-navigable FVV further comprises: determining a current view frustum corresponding to a current client viewpoint for viewing the FVV; and using portions of the aligned geometric proxies within the current view frustum for local rendering of video frames of the FVV.
 19. The computer-readable medium of claim 14 wherein one or more of the high definition bubbles move relative to the overall viewing area during capture of the 3D image data for those high definition bubbles.
 20. The computer-readable medium of claim 14 wherein one or more of the high definition bubbles overlap within the overall viewing area.
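For purposes of illustration only, and forming no part of the claims above, the following minimal Python sketch shows one possible sequencing of the process actions recited in claims 1, 8, and 14. All names (Proxy, generate_navigable_fvv, and the reconstruct, register, and render callables) are hypothetical placeholders; the sketch assumes, rather than defines, the 3D reconstruction and image-based rendering routines actually used.

    # Illustrative sketch of the claimed process flow; all names are hypothetical.
    from dataclasses import dataclass
    from typing import Any, List

    @dataclass
    class Proxy:
        geometry: Any          # geometric proxy (e.g., mesh or depth-based model)
        resolution: str        # "standard" for the overall space, "high" for bubbles
        transform: Any = None  # registration transform into the overall space

    def generate_navigable_fvv(overall_image_data: Any,
                               bubble_image_data: List[Any],
                               reconstruct, register, render) -> Any:
        """Mirrors claim 1: build an overall proxy, build per-bubble proxies,
        register the bubble proxies to the overall proxy, then render a
        multi-resolution FVV in which bubble regions carry higher resolution.
        `reconstruct`, `register`, and `render` stand in for whatever 3D
        reconstruction and image-based rendering routines are actually used."""
        # (1) Geometric proxy of the overall volumetric space.
        overall = Proxy(geometry=reconstruct(overall_image_data), resolution="standard")
        # (2) One or more geometric proxies for each sub-region (bubble).
        bubbles = [Proxy(geometry=reconstruct(d), resolution="high")
                   for d in bubble_image_data]
        # (3) Register each bubble proxy with the overall proxy.
        for b in bubbles:
            b.transform = register(b.geometry, overall.geometry)
        # (4) Render the multi-resolution, user-navigable FVV.
        return render(overall, bubbles)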